ViperGPT: Visual Inference via Python Execution for Reasoning
Answering visual queries is a complex task that requires both visual
processing and reasoning. End-to-end models, the dominant approach for this
task, do not explicitly differentiate between the two, limiting
interpretability and generalization. Learning modular programs presents a
promising alternative, but has proven challenging due to the difficulty of
learning both the programs and modules simultaneously. We introduce ViperGPT, a
framework that leverages code-generation models to compose vision-and-language
models into subroutines to produce a result for any query. ViperGPT utilizes a
provided API to access the available modules, and composes them by generating
Python code that is later executed. This simple approach requires no further
training, and achieves state-of-the-art results across various complex visual
tasks.
Website: https://viper.cs.columbia.edu
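The pipeline the abstract describes, an LLM writing Python against a small vision API, then executing it, can be sketched roughly as follows. The module name `find` and the `execute_command` convention are illustrative assumptions, not a reproduction of the paper's actual API:

```python
# Minimal sketch of a ViperGPT-style pipeline (all names are illustrative).
# A code-generation model turns a natural-language query into a Python
# program over a documented vision API; executing that program yields the answer.

def generate_code(query: str, api_doc: str) -> str:
    """Stand-in for the code-generation model: returns a Python program
    (as text) that answers `query` using the documented API."""
    # A real system would prompt an LLM with `query` and `api_doc`;
    # here we return a canned program for demonstration.
    return (
        "def execute_command(image):\n"
        "    patches = find(image, 'muffin')\n"
        "    return str(len(patches))\n"
    )

def find(image, category: str):
    """Stand-in vision module: a real one would run an open-vocabulary detector."""
    return image.get(category, [])

def run(query: str, image) -> str:
    code = generate_code(query, api_doc="find(image, category) -> patches")
    scope = {"find": find}
    exec(code, scope)                       # defines execute_command in `scope`
    return scope["execute_command"](image)  # run the generated program

# Toy "image": a dict mapping categories to detected patches.
image = {"muffin": ["patch1", "patch2", "patch3"]}
print(run("How many muffins are there?", image))  # -> 3
```

Because the composition lives in executable code rather than network weights, each intermediate step (here, the `patches` list) is inspectable, which is the interpretability benefit the abstract claims.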
Doubly Right Object Recognition: A Why Prompt for Visual Rationales
Many visual recognition models are evaluated only on their classification
accuracy, a metric for which they obtain strong performance. In this paper, we
investigate whether computer vision models can also provide correct rationales
for their predictions. We propose a ``doubly right'' object recognition
benchmark, where the metric requires the model to simultaneously produce both
the right labels as well as the right rationales. We find that state-of-the-art
visual models, such as CLIP, often provide incorrect rationales for their
categorical predictions. However, by transferring the rationales from language
models into visual representations through a tailored dataset, we show that we
can learn a ``why prompt,'' which adapts large visual representations to
produce correct rationales. Visualizations and empirical experiments show that
our prompts significantly improve performance on doubly right object
recognition, in addition to zero-shot transfer to unseen tasks and datasets.
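The "doubly right" metric described above is simple to state in code: a sample counts as correct only when the predicted label and the predicted rationale both match the ground truth. The dictionary field names below are illustrative assumptions, not the paper's data format:

```python
# Sketch of a "doubly right" accuracy metric: credit is given only when
# both the label and the rationale are correct (field names are illustrative).

def doubly_right_accuracy(predictions, ground_truth):
    correct = sum(
        1
        for pred, gt in zip(predictions, ground_truth)
        if pred["label"] == gt["label"] and pred["rationale"] == gt["rationale"]
    )
    return correct / len(ground_truth)

preds = [
    {"label": "zebra", "rationale": "black and white stripes"},
    {"label": "zebra", "rationale": "long trunk"},  # right label, wrong rationale
]
truth = [
    {"label": "zebra", "rationale": "black and white stripes"},
    {"label": "zebra", "rationale": "black and white stripes"},
]
print(doubly_right_accuracy(preds, truth))  # -> 0.5
```

Under this metric, a model that classifies perfectly but justifies its answers incorrectly, as the abstract reports for CLIP, scores poorly even though its plain classification accuracy is high.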
NTIRE 2018 Challenge on Single Image Super-Resolution: Methods and Results
This paper reviews the 2nd NTIRE challenge on single image super-resolution (restoration of rich details in a low resolution image) with a focus on the proposed solutions and results. The challenge had 4 tracks. Track 1 employed the standard bicubic downscaling setup, while Tracks 2, 3 and 4 had realistic unknown downgrading operators simulating the camera image acquisition pipeline. The operators were learnable through provided pairs of low and high resolution train images. The tracks had 145, 114, 101, and 113 registered participants, respectively, and 31 teams competed in the final testing phase. Together, the proposed methods gauge the state of the art in single image super-resolution.